Scorpion: Explaining Away Outliers in Aggregate Queries

Authors

Eugene Wu

Samuel Madden

Abstract

Database users commonly explore large data sets by running aggregate queries that project the data down to a smaller number of points and dimensions, and visualizing the results. Often, such visualizations will reveal outliers that correspond to errors or surprising features of the input data set. Unfortunately, databases and visualization systems do not provide a way to work backwards from an outlier point to the common properties of the (possibly many) unaggregated input tuples that correspond to that outlier. We propose Scorpion, a system that takes a set of user-specified outlier points in an aggregate query result as input and finds predicates that explain the outliers in terms of properties of the input tuples that are used to compute the selected outlier results. Specifically, this explanation identifies predicates that, when applied to the input data, cause the outliers to disappear from the output. To find such predicates, we develop a notion of influence of a predicate on a given output, and design several algorithms that efficiently search for maximum influence predicates over the input data. We show that these algorithms can quickly find outliers in two real data sets (from a sensor deployment and a campaign finance data set), and run orders of magnitude faster than a naive search algorithm while providing comparable quality on a synthetic data set.
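The idea of a predicate's influence on an outlier result can be illustrated with a minimal sketch. It assumes a deliberately simplified definition: the influence of a predicate is the change in the aggregate value of the outlier group when the tuples matching the predicate are removed. This is not necessarily the definition used by Scorpion, and the search below is the naive enumeration, not one of the paper's efficient algorithms. The sensor readings, column names, and the helpers `influence` and `best_equality_predicate` are all hypothetical.

```python
from statistics import mean

# Hypothetical sensor readings; the query is AVG(temp) GROUP BY hour.
ROWS = [
    {"hour": 1, "sensor": "a", "temp": 20.0},
    {"hour": 1, "sensor": "b", "temp": 22.0},
    {"hour": 2, "sensor": "a", "temp": 21.0},
    {"hour": 2, "sensor": "b", "temp": 21.0},
    {"hour": 2, "sensor": "c", "temp": 120.0},  # suspicious reading
]


def influence(rows, group_key, outlier_group, value_key, predicate, agg=mean):
    """Simplified influence (an assumption, not the paper's definition):
    aggregate of the outlier group minus the aggregate after deleting the
    tuples that satisfy `predicate`."""
    group = [r for r in rows if r[group_key] == outlier_group]
    kept = [r for r in group if not predicate(r)]
    # A predicate that removes nothing, or everything, explains nothing.
    if not kept or len(kept) == len(group):
        return 0.0
    before = agg([r[value_key] for r in group])
    after = agg([r[value_key] for r in kept])
    return before - after


def best_equality_predicate(rows, group_key, outlier_group, value_key, attr):
    """Naive search: try every predicate of the form attr == v and return
    the (value, influence) pair with the highest influence."""
    values = sorted({r[attr] for r in rows if r[group_key] == outlier_group})
    scored = [
        (v, influence(rows, group_key, outlier_group, value_key,
                      lambda r, v=v: r[attr] == v))
        for v in values
    ]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # The user flags hour 2 as an outlier (its average is too high).
    print(best_equality_predicate(ROWS, "hour", 2, "temp", "sensor"))
```

In this toy data the average for hour 2 is 54.0; deleting the readings where `sensor == "c"` brings it down to 21.0, so that predicate has the highest influence under the simplified definition. Enumerating every candidate predicate in this way becomes expensive as attributes and values multiply, which is the cost the abstract's search algorithms are designed to avoid.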

Related resources

A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries

Data streams are unbounded, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements must be processed online and in real time, so algorithms that process data streams and answer queries over them are mostly one-pass. Executing such algorithms raises challenges such as memory limitations, scheduling, and accuracy of answers. They will be more ...

Extending the Qualitative Trajectory Calculus Based on the Concept of Accessibility of Moving Objects in the Paths

Qualitative spatial representation and reasoning are among the important capabilities in intelligent geospatial information system development. Although a large contribution to the study of moving objects has been attributed to the quantitative use and analysis of data, such calculations are ineffective when there is little inaccurate data on position and geometry or when explicitly explaining ...

Ag-tree: an Index Structure for Range-aggregation Queries in Data Warehouse Environments

Range-aggregate queries are popular in many applications in data warehouse environments with large business relational databases. To evaluate these efficiently, several studies on data cubes (such as the aggregate cubetree) have been carried out. In the well-known aggregate cubetree, each entry in every node stores the aggregate values of its corresponding subtree. Therefore, range-aggregate que...

Answering Approximate Range Aggregate Queries on OLAP Data Cubes with Probabilistic Guarantees

Approximate range aggregate queries are one of the most frequent and useful kinds of queries for Decision Support Systems (DSS). Traditionally, sampling-based techniques have been proposed to tackle this problem. However, their effectiveness degrades when the underlying data distribution is skewed. Another approach based on outlier management can limit the effect of data skew but fails to...

Aggregation Query Under Uncertainty in Sensor Networks

In sensor networks, aggregation is often used to obtain some form of summary of the sensor values, such as the maximum and the average. When there are hundreds of nodes, it is inevitable that some of these sensors will malfunction and report faulty sensor values or fail to route the value information, adversely affecting the aggregate result. In this paper, we describe a simple method to locally...

Journal: PVLDB

Volume: 6

Publication date: 2013
